The GitHub repository for this report can be found here.
This study explores the current state of the musicians listed in Rolling Stone Magazine’s 100 greatest artists compilation. It investigates the endurance of their engagement by analyzing metrics from Spotify. The research quantifies listener engagement using follower counts and popularity scores, hypothesizing that genre influences popularity. Genres like rap and rock are expected to maintain high engagement, while blues and soul, with smaller contemporary audiences, are anticipated to be less popular. The study also examines decades and song features, seeking correlations with ongoing popularity.
The study utilizes data from four sources: Rolling Stone’s list, the Spotify Web API, AllMusic, and the Billboard Artist 100 chart.
Rolling Stone’s compilation was collected in two batches because the website loads the list dynamically. After the first batch of entries was scraped, the remaining entries were loaded by simulating a click on the ‘50 - 41’ ranking range and scraped in turn; the two datasets were then combined into ‘greatest_100’.
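The combining step can be sanity-checked with a minimal sketch such as the one below. It is not the report's code: `combine_batches` is a hypothetical helper, and the two toy data frames (with placeholder names) stand in for the scraped batches.

```r
# Hypothetical helper: stack two scraped batches, drop duplicate ranks,
# and order the result by rank.
combine_batches <- function(first_batch, second_batch) {
  combined <- rbind(first_batch, second_batch)
  combined <- combined[!duplicated(combined$rank), ]
  combined <- combined[order(combined$rank), ]
  rownames(combined) <- NULL
  combined
}

# Toy stand-ins for the two scraped batches (names are placeholders)
batch_a <- data.frame(rank = 6:4, name = c("Artist F", "Artist E", "Artist D"),
                      stringsAsFactors = FALSE)
batch_b <- data.frame(rank = 3:1, name = c("Artist C", "Artist B", "Artist A"),
                      stringsAsFactors = FALSE)

greatest_demo <- combine_batches(batch_a, batch_b)

# Ranks missing from the combined list (empty when the list is complete)
missing_ranks <- setdiff(seq_len(6), greatest_demo$rank)
```

For the real list, the same check with `seq_len(100)` would flag any rank that failed to load before the page source was captured.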
While the spotifyr package provides convenient access to the Spotify API, additional functions were created to manage the API rate limit effectively. Specifically, ‘get_artist_info’ is applied to each artist in ‘greatest_100’ in a for loop to extract genres, popularity, and follower counts. To address inconsistencies in names, an ‘if’ expression was employed (for example, “Parliament and Funkadelic” is looked up as “Parliament”). Sub-genres were grouped under main headings, and artists with multiple affiliations were categorized by the most prominent one. To calculate average feature values for tracks, the ‘get_artist_audio_features’ function from spotifyr was called sequentially so as not to exceed the rate limit.
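One common way to stay within a rate limit is to retry a failed call after a growing pause. The sketch below illustrates that idea only; it is not the helper used in this report. `call_with_backoff` and `flaky_lookup` are hypothetical names, and the toy function merely imitates a rate-limited request without touching the network.

```r
# Hypothetical wrapper: call `fn`, and on error wait and try again,
# doubling the pause each time (exponential backoff).
call_with_backoff <- function(fn, ..., max_tries = 5, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Toy stand-in for an API call that is rate limited on its first two calls
calls_made <- 0
flaky_lookup <- function(artist_name) {
  calls_made <<- calls_made + 1
  if (calls_made < 3) stop("HTTP 429: rate limit exceeded")
  paste("info for", artist_name)
}

lookup_result <- call_with_backoff(flaky_lookup, "Eminem", base_wait = 0.01)
```

In a pipeline like the one described above, the wrapper could be used as `call_with_backoff(get_artist_info, artist_name)`. A fuller version might wait for the number of seconds given in the `Retry-After` header, which the Spotify API normally sends with a 429 response, instead of using a fixed schedule.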
As the API does not offer data regarding decades, the artists’ active years were scraped from the AllMusic site. The scraped text was then processed using regular expressions and added to ‘greatest_100_info’ as a column indicating the debut decade of each artist.
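The regular-expression step can be illustrated with a small base-R sketch. The appendix uses `str_extract(decade, "\\d{4}s")` from stringr; the function below is a base-R counterpart written for illustration. `extract_first_decade` is a hypothetical name, and the input strings are made up rather than taken from the scraped data.

```r
# Return the first decade token (four digits followed by "s") in each string,
# or NA when no decade is present.
extract_first_decade <- function(x) {
  match_pos <- regexpr("[0-9]{4}s", x)
  out <- rep(NA_character_, length(x))
  out[match_pos > 0] <- regmatches(x, match_pos)
  out
}

# Made-up examples of scraped "active years" text
active_text <- c("1960s - 2010s", "Active 1950s - 1970s", "unknown")
debut_decade <- extract_first_decade(active_text)
```

Because only the first match is kept, an artist active from the 1960s to the 2010s is assigned to the 1960s, which is how the debut decade is derived.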
To evaluate the present popularity of the artists featured in Rolling Stone, the number of them appearing on the Billboard Artist 100 chart throughout 2023 was tracked. Billboard releases weekly rather than monthly charts, so data was extracted from the third week of each month as representative. A helper function and a for loop were then used to compile the artists present in both the Rolling Stone and Billboard rankings.
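The matching of the two rankings can be sketched as follows. This is a toy example with three made-up monthly snapshots, not the report's data; all variable names are illustrative.

```r
# Toy stand-ins: the magazine list and three monthly chart snapshots
demo_magazine <- c("The Beatles", "Eminem", "Elvis Presley", "Muddy Waters")
demo_charts <- list(
  January  = c("Eminem", "The Beatles", "Some Other Act"),
  February = c("Eminem", "Another Act"),
  March    = c("Elvis Presley", "Eminem", "The Beatles")
)

# For each month, keep the magazine artists that also appear on the chart
demo_common_by_month <- lapply(demo_charts, function(chart) {
  demo_magazine[demo_magazine %in% chart]
})

# Number of shared artists per month
demo_common_counts <- lengths(demo_common_by_month)

# Artists that appeared in at least one month
demo_appeared_any <- Reduce(union, demo_common_by_month)
```

Note that `%in%` requires exact name matches, so any spelling difference between the two sources would cause an artist to be missed.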
After data processing, three primary analysis datasets were generated:
‘greatest_100_info’: Contains artists’ names, genres, popularity scores, followers, decades, and magazine rankings.
‘features_vs_popularity’: Includes average values of artists’ music features.
‘billboard_artists’: Subset of ‘greatest_100_info’ for artists in Billboard rankings.
Fig.1: The graph depicts Spotify popularity scores of artists with their corresponding ranks. Each bubble is color-coded based on genre, and the size is determined by the follower count on Spotify.
Spotify provides a popularity score for artists, based on the average of plays and how recent those plays are. This, combined with follower counts, offers insight into listener engagement. The clustering of artists in the 50-85 popularity range suggests sustained interest in their music. Notably, rap artists, positioned towards the end of the list, have higher average popularity scores, with Eminem leading. Rock artists have an average popularity score of approximately 68.38, while soul, blues, and country artists are among the least popular as of the end of 2023.
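Per-genre averages like the one quoted for rock can be obtained with a simple grouped mean. The sketch below uses toy data with made-up scores, so its numbers are not the report's values; the data frame and variable names are illustrative.

```r
# Toy data with made-up popularity scores (not the report's values)
demo_artists <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  genres = c("rock", "rock", "rap", "blues", "rap"),
  popularity = c(70, 60, 85, 40, 75),
  stringsAsFactors = FALSE
)

# Mean popularity per genre, sorted from most to least popular
genre_means <- aggregate(popularity ~ genres, data = demo_artists, FUN = mean)
genre_means <- genre_means[order(-genre_means$popularity), ]
```

Applied to ‘greatest_100_info’, the same two lines would give one average popularity score per main genre.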
Fig.2: The box plot shows the popularity range of artists grouped by decade.
Examining the 2023 popularity of artists by the decade in which they began their careers reveals certain trends. Although the magazine’s compilation includes more artists who debuted in the 1960s and 1970s, artists generally tend to be more popular the closer their debut is to the present. However, rock artists who began their careers in the 1960s continue to maintain their popularity in 2023.
Lyrically rich songs attract more listeners with a moderate correlation. Faster tempo songs have lower popularity, with a weak correlation. Energetic tracks with a fast, loud feel are more popular, despite a very low correlation. Danceability doesn’t show a clear correlation, with highly danceable songs not necessarily being more popular, and less danceable songs still achieving significant popularity. Thus, a song’s popularity is primarily influenced by its decade and genre rather than specific Spotify metrics.
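Strength labels such as “moderate” and “weak” correspond to the size of a correlation coefficient. The report does not state which coefficient was used, so the sketch below assumes Pearson correlation. The data are made up, and the cut-offs for the verbal labels are an assumption rather than something taken from the analysis.

```r
# Toy data with made-up values (not the report's measurements)
demo_features <- data.frame(
  popularity   = c(55, 60, 72, 80, 66),
  speechiness  = c(0.03, 0.04, 0.08, 0.12, 0.05),
  tempo        = c(128, 121, 118, 110, 123),
  energy       = c(0.55, 0.70, 0.62, 0.68, 0.60),
  danceability = c(0.50, 0.48, 0.61, 0.55, 0.52)
)

feature_names <- setdiff(names(demo_features), "popularity")

# Pearson correlation of each feature with popularity
feature_cor <- sapply(feature_names, function(f) {
  cor(demo_features[[f]], demo_features$popularity)
})

# Rough verbal labels; the cut-offs are an assumption, not from the report
cor_strength <- cut(abs(feature_cor),
                    breaks = c(0, 0.3, 0.7, 1),
                    labels = c("weak", "moderate", "strong"),
                    include.lowest = TRUE)
```

A coefficient only captures linear association, which is one reason the appendix also fits a quadratic curve to each feature.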
Fig.3-7: Graphs illustrate correlations between artists’ popularity change and mean levels of speechiness, tempo, energy, and danceability in their songs.
Fig.8: The line graph illustrates the number of artists featured in the Billboard Artist 100 Chart. (Data is extracted from the third week of each month.)
Billboard Artist 100 charts reveal insights into artists’ endurance. Despite monthly fluctuations, only 24 artists consistently appeared throughout 2023. Consistent with the analysis above, the artists on the Billboard list are mostly those from later decades and rock artists who debuted in the 1960s. However, country and blues artists struggle to gain entry.
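How many artists count as appearing depends on whether an artist must be present in every monthly snapshot or in at least one. The text does not spell this out, so the sketch below shows how both views could be computed. It uses toy data and illustrative variable names.

```r
# Toy stand-in: magazine artists found on the chart in each month
demo_monthly_common <- list(
  January  = c("The Beatles", "Eminem", "Elvis Presley"),
  February = c("Eminem", "The Beatles"),
  March    = c("Eminem", "The Beatles", "Queen")
)

# Artists present in every monthly snapshot
always_present <- Reduce(intersect, demo_monthly_common)

# Number of months in which each artist appeared
months_present <- table(unlist(demo_monthly_common))
```

The `months_present` table also makes it easy to see which artists entered the chart only briefly.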
In conclusion, artists who debuted more recently and artists in contemporary genres such as rock and rap maintain higher popularity levels. Additionally, lyrically rich songs are associated with higher popularity in 2023.
Spotify. (n.d.). Spotify Web API. Retrieved December 27, 2023, from https://developer.spotify.com/documentation/web-api/
Rolling Stone. (2010, December 3). 100 Greatest Artists: The Beatles, Eminem and more of the best of the best. Retrieved December 27, 2023, from https://www.rollingstone.com/music/music-lists/100-greatest-artists-147446/jimi-hendrix-5-30413/
Billboard. (n.d.). Billboard Artist 100. Retrieved December 29, 2023, from https://www.billboard.com/charts/artist-100/
AllMusic. (n.d.). AllMusic. Retrieved December 31, 2023, from www.allmusic.com
## Appendix: All code in this assignment
knitr::opts_chunk$set(echo = FALSE) # actually set the global chunk options.
library(spotifyr)
library(tidyr)
library(dplyr)
library(httr)
library(jsonlite)
library(plotly)
library(RSelenium)
library(rvest)
library(ggplot2)
library(stringr)
## Scrape the Rolling Stone's Compilation of the 100 greatest artists
# Start the Selenium server
rD <- rsDriver(browser = "firefox", verbose = FALSE, port = netstat::free_port(random = TRUE), chromever = NULL)
driver <- rD[["client"]]
# Specify the URL of the Rolling Stone page
rank_url <- "https://www.rollingstone.com/music/music-lists/100-greatest-artists-147446/"
driver$navigate(rank_url)
# Find the "Reject All" button and click it
reject_button <- driver$findElement(using = "css", value = "#onetrust-reject-all-handler")
reject_button$clickElement()
# Extract the HTML content of the entire page
page_source <- driver$getPageSource()
# Parse the HTML content with rvest
page <- read_html(page_source[[1]])
# Extract rank and name information
rank_elements <- page %>% html_nodes("span.c-gallery-vertical-album__number")
name_elements <- page %>% html_nodes("h2.c-gallery-vertical-album__title")
# Extract text from HTML nodes
ranks <- html_text(rank_elements)
names <- html_text(name_elements)
# Create a data frame with rank and name
artists_data <- data.frame(
rank = as.integer(ranks),
name = names,
stringsAsFactors = FALSE
)
# Scroll to the bottom of the page so the pagination links are rendered
webElem <- driver$findElement("css", "body")
webElem$sendKeysToElement(list(key = "end"))
# Wait for some time to let the page load
Sys.sleep(5)
# Click on the "50 - 41" link to load the next set of artists
driver$findElement(using = "link text", value = "50 - 41")$clickElement()
# Wait for some time after clicking to allow content to load
Sys.sleep(5)
# Extract the HTML content of the page after clicking
page_source_next <- driver$getPageSource()
# Parse the HTML content with rvest for the next set of artists
page_next <- read_html(page_source_next[[1]])
# Extract rank and name information for the next set of artists
rank_elements_next <- page_next %>% html_nodes("span.c-gallery-vertical-album__number")
name_elements_next <- page_next %>% html_nodes("h2.c-gallery-vertical-album__title")
# Extract text from HTML nodes for the next set of artists
ranks_next <- html_text(rank_elements_next)
names_next <- html_text(name_elements_next)
# Create a data frame for the next set of artists
artists_data_next <- data.frame(
rank = as.integer(ranks_next),
name = names_next,
stringsAsFactors = FALSE
)
# Close the RSelenium server
rD[["server"]]$stop()
# Merge the data frames using rbind
greatest_100 <- rbind(artists_data, artists_data_next)
## Define functions to use to access Spotify API and with spotifyr package
# Read the API KEY
readRenviron("~/Desktop/myenvs/spotify.env")
apikey <- Sys.getenv("SPOTIFY_CLIENT_SECRET")
access_token <- get_spotify_access_token()
# Function to get artist information by name
get_artist_info <- function(artist_name) {
access_token <- get_spotify_access_token()
# Spotify API endpoint for artists
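# reserved = TRUE also encodes characters such as "&" that would otherwise break the query string
artist_endpoint <- paste0("https://api.spotify.com/v1/search?q=",
URLencode(artist_name, reserved = TRUE), "&type=artist&limit=1")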
# Make GET request to the Spotify API
response <- GET(
url = artist_endpoint,
add_headers(Authorization = paste("Bearer", access_token))
)
stop_for_status(response)
# Parse JSON response
artist_info <- content(response, "parsed")
# Extract relevant information
artist_data <- data.frame(
name = artist_info$artists$items[[1]]$name,
genres = toString(artist_info$artists$items[[1]]$genres),
popularity = artist_info$artists$items[[1]]$popularity,
followers_total = artist_info$artists$items[[1]]$followers$total,
stringsAsFactors = FALSE
)
return(artist_data)
}
# Function to get artist ID by name
get_artist_id <- function(artist_name) {
access_token <- get_spotify_access_token()
# Spotify API endpoint for artists
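# reserved = TRUE also encodes characters such as "&" that would otherwise break the query string
artist_endpoint <- paste0("https://api.spotify.com/v1/search?q=",
URLencode(artist_name, reserved = TRUE), "&type=artist&limit=1")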
# Make GET request to the Spotify API
response <- GET(
url = artist_endpoint,
add_headers(Authorization = paste("Bearer", access_token))
)
stop_for_status(response)
# Parse JSON response
artist_info <- content(response, "parsed")
# Extract relevant information
artist_id <- artist_info$artists$items[[1]]$id
return(artist_id)
}
# Function to get an artist's top tracks by artist ID
get_artist_top_tracks <- function(artist_id, country_code = "US") {
access_token <- get_spotify_access_token()
endpoint <- paste0("https://api.spotify.com/v1/artists/", artist_id, "/top-tracks")
response <- GET(
endpoint,
add_headers("Authorization" = paste("Bearer", access_token)),
query = list(country = country_code, limit = 10),
verbose()
)
stop_for_status(response)
tracks_data <- content(response)$tracks
tracks_df <- data.frame(
name = sapply(tracks_data, function(track) track$name),
popularity = sapply(tracks_data, function(track) track$popularity),
id = sapply(tracks_data, function(track) track$id),
stringsAsFactors = FALSE
)
return(tracks_df)
}
# Create an empty list to store the results
artist_info_list <- list()
# Iterate over each row in greatest_100
for (i in seq_len(nrow(greatest_100))) {
# Get the artist name and rank from the current row
artist_name <- greatest_100$name[i]
artist_rank <- greatest_100$rank[i]
# Handle the special case for "Parliament and Funkadelic"
if (artist_name == "Parliament and Funkadelic") {
# Call get_artist_info("Parliament") and store the result
artist_info <- get_artist_info("Parliament")
} else {
# Call get_artist_info() with the current artist name and store the result
artist_info <- get_artist_info(artist_name)
}
# Add the rank to the artist_info data frame
artist_info$rank <- artist_rank
# Add the result to the list
artist_info_list[[i]] <- artist_info
}
# Convert the list of data frames to a single data frame
greatest_100_info <- do.call(rbind, artist_info_list)
# Group genres under main headings
greatest_100_info <- greatest_100_info %>%
mutate(genres = case_when(
grepl("hip-hop", tolower(genres)) | grepl("hip hop", tolower(genres)) | grepl("rap", tolower(genres)) ~ "rap",
grepl("rock", tolower(genres)) | grepl("punk", tolower(genres)) ~ "rock",
grepl("blues", tolower(genres)) ~ "blues",
grepl("soul", tolower(genres)) | grepl("motown", tolower(genres)) ~ "soul",
grepl("pop", tolower(genres)) | grepl("reggae", tolower(genres)) ~ "pop",
grepl("country", tolower(genres)) ~ "country",
# Note: reggae is already mapped to "pop" above, so this branch is never reached
grepl("reggae", tolower(genres)) ~ "reggae",
TRUE ~ genres
)) %>%
mutate(genres = ifelse(name == "Muddy Waters", "blues", genres),
genres = ifelse(name == "The Drifters", "blues", genres),
genres = ifelse(name == "Fats Domino", "blues", genres),
genres = ifelse(name == "The Shirelles", "soul", genres),
genres = ifelse(name == "Jackie Wilson", "soul", genres),
genres = ifelse(name == "Michael Jackson", "pop", genres))
greatest_100_info$popularity <- as.numeric(greatest_100_info$popularity)
# Create the bubble chart
popularity_vs_genre <- plot_ly(
data = greatest_100_info,
x = ~rank,
y = ~popularity,
type = "scatter",
text = ~name,
color = ~genres,
mode = "markers",
size = ~followers_total,
fill = ~'',
marker = list(sizemode = "diameter", opacity = 0.7),
colors = viridis::viridis(20)
) %>%
layout(
title = "Artist Popularity and Follower Counts",
xaxis = list(
title = "Rolling Stone Ranking",
range = c(-5, max(greatest_100_info$rank) + 5),
zeroline = FALSE
),
yaxis = list(
title = "Popularity Score",
range = c(0, max(greatest_100_info$popularity) + 15),tickmode = "linear",
dtick = 10
),
showlegend = TRUE
)
# Show the bubble chart
popularity_vs_genre
# Function to navigate to the URL and accept cookies
#navigate_and_accept_cookies <- function() {
# # Initialize the RSelenium driver
# rD <- rsDriver(browser = "firefox", verbose = FALSE, port = netstat::free_port(random = TRUE), chromever = NULL)
# driver <- rD[["client"]]
# # Navigate to the AllMusic search page
# driver$navigate("https://www.allmusic.com/search/all/")
# # Wait for 1 second before clicking the "AGREE" button
# Sys.sleep(1)
# agree_button <- driver$findElement(using = "css selector", value = ".fc-cta-consent")
# # Click the "AGREE" button
# agree_button$clickElement()
# return(driver)
#}
# Function to scrape artist decades
#scrape_artist_decades <- function(driver, artist_name) {
# # Find the search input field by its CSS selector and enter the artist name
# search_input <- driver$findElement(using = "css selector", value = ".siteSearchInput")
# search_input$sendKeysToElement(list(artist_name))
# # Find and click the search button
# search_button <- driver$findElement(using = "css selector", value = ".siteSearchButton")
# search_button$clickElement()
# if (artist_name == "Diana Ross & The Supremes") {
# # Find and click the specified artist filter
# element_to_click <- driver$findElement(using = "css selector", value = ".filter > div:nth-child(2) > h3:nth-child(1) > a:nth-child(1)")
# element_to_click$clickElement()
# }
# # Find the artist info section by its CSS selector
# artist_info <- driver$findElement(using = "css selector", value = "div.artist:nth-child(2) > div:nth-child(1)")
# # Extract the decades
# decades <- artist_info$findElement(using = "css selector", value = "div.artist:nth-child(2) > div:nth-child(1) > div:nth-child(4)")
# decades_text <- decades$getElementText()[[1]] %>%
# str_replace("^.*\\n", "")
# return(decades_text)
#}
# Initialize the RSelenium driver and accept cookies
#driver <- navigate_and_accept_cookies()
# List of artist names
#artist_names <- greatest_100_info$name
# Create an empty tibble to store the data
#artists_decades <- tibble(
# name = character(0),
# decade = character(0)
#)
#
# Loop through artist names and scrape their decades
#for (artist_name in artist_names) {
# decades <- scrape_artist_decades(driver, artist_name)
# artists_decades <- artists_decades %>%
# add_row(name = artist_name, decade = decades)
#}
# Close the driver when done
#driver$close()
#artists_decades <- artists_decades %>%
# mutate(decade = str_extract(decade, "\\d{4}s"))
# Write artists_decades to the csv
# write.csv(artists_decades, "artist_decades.csv")
# Read artists_decades
artists_decades <- read.csv("artist_decades.csv")
greatest_100_info <- greatest_100_info %>%
left_join(select(artists_decades, name, decade), by = "name")
popularity_vs_decade <- plot_ly(data = greatest_100_info, x = ~decade, y = ~popularity) %>%
add_boxplot(marker = list(color = 'lightgray'), showlegend = FALSE) %>%
add_trace(x = ~decade, y = ~popularity, type = 'scatter', mode = 'markers',
marker = list(size = 10), text = ~name, hoverinfo = 'text', name = "Artists") %>%
layout(title = "Popularity Distribution by Decade",
xaxis = list(title = "Decade"),
yaxis = list(title = "Popularity"))
popularity_vs_decade
# Calculate mean popularity for each decade
mean_popularity_by_decade <- greatest_100_info %>%
group_by(decade) %>%
summarize(mean_popularity = mean(popularity, na.rm = TRUE))
# To be commented
# unique_common_artists <- unique_common_artists[unique_common_artists != "Neil Young"]
# Create an empty data frame to store the mean values
# mean_values_df <- data.frame()
# Loop through unique_common_artists and calculate mean values
# for (i in seq_along(unique_common_artists)) {
# artist_id <- get_artist_id(unique_common_artists[i])
# if (artist_id != '6v8FB84lnmJs434UJf2Mrm') {
# artist_features <- spotifyr::get_artist_audio_features(artist_id) %>%
# select(artist_name, danceability, energy, loudness, speechiness, acousticness, instrumentalness, tempo)
# mean_values <- artist_features %>%
# summarise(across(where(is.numeric), mean))
# # Add artist_name to mean_values
# mean_values$artist_name <- unique_common_artists[i]
# # Bind the result to the mean_values_df
# mean_values_df <- bind_rows(mean_values_df, mean_values)}
# }
#neil_tracks <- get_artist_top_tracks(get_artist_id("neil young"))
#for (id in neil_tracks$id){
# artist_feature <- get_track_audio_features(id) %>%
# select(danceability, energy, loudness, speechiness, acousticness, instrumentalness, tempo)
# mean_values <- artist_feature %>%
# summarise(across(where(is.numeric), mean))
# mean_values$artist_name <- "Neil Young"
# }
# mean_full <- bind_rows(mean_values_df, mean_values)
# features_vs_popularity <- left_join(mean_full, select(greatest_100_info, name, popularity), by = c("artist_name" = "name"))
# Write features_vs_popularity to a CSV file
# write.csv(features_vs_popularity, "features_vs_popularity.csv", row.names = FALSE)
#not_in_billboard <- greatest_100_info$name[!(greatest_100_info$name %in% unique_common_artists)]
# Create an empty data frame to store the mean values
#mean_values_df <- data.frame()
# Loop through not_in_billboard and calculate mean values
#for (i in seq_along(not_in_billboard)) {
#artist_id <- get_artist_id(not_in_billboard[i])
# if (artist_id != '6v8FB84lnmJs434UJf2Mrm') {
# artist_features <- spotifyr::get_artist_audio_features(artist_id) %>%
# select(artist_name, danceability, energy, loudness, speechiness, acousticness, instrumentalness, tempo)
# mean_values <- artist_features %>%
# summarise(across(where(is.numeric), mean))
# Add artist_name to mean_values
# mean_values$artist_name <- not_in_billboard[i]
# Bind the result to the mean_values_df
# mean_values_df <- bind_rows(mean_values_df, mean_values)
# }}
#not_in_billboard_features <- left_join(mean_values_df, select(greatest_100_info, name, popularity), by = c("artist_name" = "name"))
#not_in_billboard_features$genre <- greatest_100_info$genres[greatest_100_info$name %in% not_in_billboard_features$artist_name]
#not_in_billboard_features <- not_in_billboard_features %>%
#select("artist_name", "popularity", "genre", everything())
#features_vs_popularity$genre <- greatest_100_info$genres[greatest_100_info$name %in% features_vs_popularity$artist_name]
#features_vs_popularity <- features_vs_popularity %>%
#select("artist_name", "popularity", "genre", everything())
#write.csv(not_in_billboard_features, "not_in_billboard_features.csv", row.names = FALSE)
# Read the CSV files
in_billboard_features <- read.csv("features_vs_popularity.csv")
# Use match() so each genre is aligned with its own artist row
in_billboard_features$genre <- greatest_100_info$genres[match(in_billboard_features$artist_name, greatest_100_info$name)]
in_billboard_features <- in_billboard_features %>%
select("artist_name", "popularity", "genre", everything())
not_in_billboard_features <- read.csv("not_in_billboard_features.csv")
# Drop 'genre' column from both data frames
in_billboard_features <- in_billboard_features[, !(colnames(in_billboard_features) %in% c("genre", "X"))]
not_in_billboard_features <- not_in_billboard_features[, !colnames(not_in_billboard_features) %in% "genre"]
features_vs_popularity <- rbind(in_billboard_features, not_in_billboard_features)
# Function to create a scatter plot with polynomial regression
create_feature_plot <- function(data, feature, title_suffix) {
# Sort the data by the specified feature
sorted_data <- data[order(data[[feature]]), ]
# Create a scatter plot for the specified feature
feature_plot <- plot_ly(sorted_data,
x = ~get(feature),
y = ~popularity,
text = ~artist_name,
type = "scatter",
mode = "markers",
marker = list(size = 10),
name = "Actual",
showlegend = FALSE)
# Fit a quadratic polynomial regression model for the specified feature
model_data <- data.frame(popularity = sorted_data$popularity, feature_value = sorted_data[[feature]])
poly_model_feature <- lm(popularity ~ poly(feature_value, 2), data = model_data)
# Generate predicted values from the polynomial model for the specified feature
predicted_values_feature <- predict(poly_model_feature, newdata = model_data)
# Add polynomial regression line to the plot for the specified feature
feature_plot <- feature_plot %>%
add_trace(x = sorted_data[[feature]],
y = predicted_values_feature,
type = "scatter",
mode = 'lines+markers',
line = list(color = "blue"),
name = paste("Predicted", tools::toTitleCase(feature)))
# Label the axes of the plot
feature_plot <- feature_plot %>%
  layout(xaxis = list(title = tools::toTitleCase(feature)),
         yaxis = list(title = "Popularity"))
return(feature_plot)
}
# Create the plot for the "speechiness" feature
speechiness_vs_pop <- create_feature_plot(features_vs_popularity, "speechiness") %>%
  layout(title = "Popularity Trends Across Song Speechiness Levels",
         xaxis = list(title = "Speechiness"),
         yaxis = list(title = "Popularity"))
# Show the plot
speechiness_vs_pop
# Create the plot for the "tempo" feature
tempo_vs_pop <- create_feature_plot(features_vs_popularity, "tempo") %>%
  layout(title = "Popularity Trends Across Song Tempo Levels",
         xaxis = list(title = "Tempo"),
         yaxis = list(title = "Popularity"))
# Show the plot
tempo_vs_pop
# Create the plot for the "energy" feature
energy_vs_pop <- create_feature_plot(features_vs_popularity, "energy") %>%
  layout(title = "Popularity Trends Across Song Energy Levels",
         xaxis = list(title = "Energy"),
         yaxis = list(title = "Popularity"))
energy_vs_pop
# Create plot for the "danceability" feature
danceability_vs_pop <- create_feature_plot(features_vs_popularity, "danceability") %>%
layout(title = "Popularity Trends Across Song Danceability Levels",
xaxis = list(title = "Danceability"),
yaxis = list(title = "Popularity"))
danceability_vs_pop
# Function to retrieve Billboard data
get_billboard_data <- function(url) {
# Start the Selenium server
rD <- rsDriver(browser = "firefox", verbose = FALSE, port = netstat::free_port(random = TRUE), chromever = NULL)
driver <- rD[["client"]]
# Specify the URL of the Billboard Artist 100 chart
driver$navigate(url)
# Wait for the page to load
Sys.sleep(1)
# Extract the HTML content of the entire page
page_source <- driver$getPageSource()
# Parse the HTML content with rvest
page <- read_html(page_source[[1]])
# Create a sequence from 1 to 100
ranks <- seq(1, 100)
# Extract the names
names <- page %>%
html_nodes("li.o-chart-results-list__item h3.c-title") %>%
html_text() %>%
gsub("^\\s+|\\s+$", "", .)
# Create a data frame to store the data
billboard_data <- data.frame(Rank = ranks, name = names)
# Close the Selenium connection
rD[["server"]]$stop()
return(billboard_data)
}
# Specify the base URL
base_url <- "https://www.billboard.com/charts/artist-100/2023-"
# Define the months
months <- c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12")
# Initialize a list to store the data for each month
billboard_data_list <- list()
# Retrieve Billboard data for each month
for (month in months) {
# Construct the URL for the current month
url <- paste0(base_url, month, "-23/")
# Retrieve Billboard data for the current month
billboard_data <- get_billboard_data(url)
# Store the data in the list
billboard_data_list[[month]] <- billboard_data
}
# Function to compare common artists
compare_common_artists <- function(greatest_100, billboard_data) {
# Find common artists
common_artists <- greatest_100$name[greatest_100$name %in% billboard_data$name]
return(common_artists)
}
# Initialize an empty list to store common artists for each month
common_artists_list <- list()
# Compare common artists for each month
for (i in seq_along(billboard_data_list)) {
common_artists <- compare_common_artists(greatest_100, billboard_data_list[[i]])
common_artists_list[[i]] <- common_artists
}
# Count the number of common artists for each month
common_counts <- sapply(common_artists_list, length)
# Create a data frame for plotting
billboard_artist_counts <- data.frame(Month = month.name, Count = common_counts)
# Create an empty list to store hover text
hover_text <- list()
# Loop through each month and update the hover text
for (i in seq_along(common_artists_list)) {
hover_text[[i]] <- paste("Common Artists:\n", paste(common_artists_list[[i]], collapse = "\n"))
}
# Create an interactive line graph
greatest_vs_billboard <- plot_ly(billboard_artist_counts , x = ~Month, y = ~Count, type = 'scatter', mode = 'lines+markers',
marker = list(size = 10, color = 'rgba(219, 38, 38, 0.7)'),
text = hover_text,
hoverinfo = 'text') %>% layout(
hovermode = 'closest',
showlegend = FALSE,
hoverlabel = list(bgcolor = 'white'),
xaxis = list(categoryorder = "array", categoryarray = month.name),
yaxis = list(range = c(0, max(common_counts) + 1)),
title = list(text = "The Artists who entered the Billboard Chart in 2023", font = list(size = 16), yanchor = "top", y = 0.95)
)
greatest_vs_billboard
# Initialize an empty vector to store unique common artists
unique_common_artists <- character(0)
#Loop through each month and update the vector
for (i in seq_along(billboard_data_list)) {
common_artists_month <- compare_common_artists(greatest_100, billboard_data_list[[i]])
unique_common_artists <- union(unique_common_artists, common_artists_month)
}
# Convert to a data frame
billboard_artists <- data.frame(artist = unique_common_artists, stringsAsFactors = FALSE) %>%
left_join(greatest_100_info, by = c("artist" = "name"))
decade_counts <- billboard_artists %>%
group_by(decade) %>%
summarize(count = n())
genre_counts_billboard <- billboard_artists %>%
group_by(genres) %>%
summarise(count = n())